105 research outputs found
A Study on Prevention of Non-Performing Assets of Chinese State-Owned Commercial Banks
For a long time, Chinese state-owned commercial banks have faced large volumes of non-performing assets and high non-performing-asset ratios. Based on China’s national conditions, and drawing on the experience of the US banking industry, this article makes proposals for preventing and controlling non-performing assets from four aspects: the state-owned enterprise system, government regulation, credit risk management, and the disposal of non-performing assets.
Event-triggered communication for passivity and synchronisation of multi-weighted coupled neural networks with and without parameter uncertainties
A multi-weighted coupled neural networks (MWCNNs) model with event-triggered communication is studied here. On the one hand, the passivity of the presented network model is studied by utilising Lyapunov stability theory and some inequality techniques, and a synchronisation criterion based on the obtained output-strict passivity condition of MWCNNs with event-triggered communication is derived. On the other hand, some robust passivity and robust synchronisation criteria based on output-strict passivity of the proposed network with uncertain parameters are presented. Finally, two numerical examples are provided to verify the effectiveness of the output-strict passivity and robust synchronisation results.
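The event-triggered communication scheme described above can be illustrated with a toy rule: a node broadcasts its state only when it has drifted sufficiently from the last transmitted value. The max-norm error and fixed threshold below are illustrative assumptions; the paper's actual triggering condition is derived from its passivity analysis.

```python
def event_triggered(last_sent, current, threshold):
    """Decide whether a node should broadcast its state.

    Transmits only when the deviation from the last broadcast state
    exceeds the threshold, which is what saves communication compared
    with time-triggered (periodic) updates.
    """
    error = max(abs(c - l) for c, l in zip(current, last_sent))
    return error > threshold
```

In a simulation loop, each node would evaluate this condition at every step and refresh `last_sent` only when it returns `True`.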
Cross-identity Video Motion Retargeting with Joint Transformation and Synthesis
In this paper, we propose a novel dual-branch Transformation-Synthesis
network (TS-Net), for video motion retargeting. Given one subject video and one
driving video, TS-Net can produce a new plausible video with the subject
appearance of the subject video and motion pattern of the driving video. TS-Net
consists of a warp-based transformation branch and a warp-free synthesis
branch. The novel design of dual branches combines the strengths of
deformation-grid-based transformation and warp-free generation for better
identity preservation and robustness to occlusion in the synthesized videos. A
mask-aware similarity module is further introduced to the transformation branch
to reduce computational overhead. Experimental results on face and dance
datasets show that TS-Net achieves better performance in video motion
retargeting than several state-of-the-art models as well as its single-branch
variants. Our code is available at https://github.com/nihaomiao/WACV23_TSNet.
Comment: WACV 2023
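The dual-branch design can be sketched as a per-pixel soft blend of the two branch outputs; the function name and the simple convex combination below are assumptions for illustration, not TS-Net's exact merging layer.

```python
def fuse_branches(warped, synthesized, mask):
    """Blend the warp-based transformation output with the warp-free
    synthesis output. Where the mask is near 1 the warped pixels are
    trusted (good identity preservation); where it is near 0 (e.g.
    occluded regions) the synthesized pixels take over."""
    return [m * w + (1.0 - m) * s
            for w, s, m in zip(warped, synthesized, mask)]
```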
FakeLocator: Robust Localization of GAN-Based Face Manipulations
Full face synthesis and partial face manipulation by virtue of the generative
adversarial networks (GANs) and its variants have raised wide public concerns.
In the multi-media forensics area, detecting and ultimately locating the image
forgery has become an imperative task. In this work, we investigate the
architecture of existing GAN-based face manipulation methods and observe that
the imperfections of the upsampling methods therein can serve as an
important asset for GAN-synthesized fake image detection and forgery
localization. Based on this observation, we propose a novel
approach, termed FakeLocator, to obtain high localization accuracy, at full
resolution, on manipulated facial images. To the best of our knowledge, this is
the very first attempt to solve the GAN-based fake localization problem with a
gray-scale fakeness map that preserves more information about fake regions. To
improve the universality of FakeLocator across multifarious facial attributes,
we introduce an attention mechanism to guide the training of the model. To
improve the universality of FakeLocator across different DeepFake methods, we
propose partial data augmentation and single sample clustering on the training
images. Experimental results on popular FaceForensics++, DFFD datasets and
seven different state-of-the-art GAN-based face generation methods have shown
the effectiveness of our method. Compared with the baselines, our method
performs better on various metrics. Moreover, the proposed method is robust
against various real-world facial image degradations such as JPEG compression,
low-resolution, noise, and blur.
Comment: 16 pages, accepted to IEEE Transactions on Information Forensics and Security
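Training against a gray-scale fakeness map rather than a binary mask can be sketched as a plain per-pixel regression loss; this L1 formulation is an illustrative assumption, not necessarily the paper's exact objective.

```python
def fakeness_map_loss(pred, target):
    """Mean absolute error between a predicted and a ground-truth
    gray-scale fakeness map. A gray-scale target preserves per-pixel
    manipulation strength instead of collapsing it to fake/real."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```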
Protect Federated Learning Against Backdoor Attacks via Data-Free Trigger Generation
As a distributed machine learning paradigm, Federated Learning (FL) enables
large-scale clients to collaboratively train a model without sharing their raw
data. However, due to the lack of data auditing for untrusted clients, FL is
vulnerable to poisoning attacks, especially backdoor attacks. By using poisoned
data for local training or directly changing the model parameters, attackers
can easily inject backdoors into the model, causing it to misclassify
images containing targeted trigger patterns. To address these issues, we
propose a novel data-free trigger-generation-based defense approach based on
the two characteristics of backdoor attacks: i) triggers are learned faster
than normal knowledge, and ii) trigger patterns have a greater effect on image
classification than normal class patterns. Our approach generates the images
with newly learned knowledge by identifying the differences between the old and
new global models, and filters trigger images by evaluating the effect of these
generated images. By using these trigger images, our approach eliminates
poisoned models to ensure the updated global model is benign. Comprehensive
experiments demonstrate that our approach can defend against almost all the
existing types of backdoor attacks and outperform all seven
state-of-the-art defense methods in both IID and non-IID scenarios.
In particular, our approach can successfully defend against backdoor attacks
even when 80% of the clients are malicious.
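The trigger-filtering step can be sketched as comparing how strongly each generated image activates the new global model versus the old one; the score-gap criterion and threshold below are assumptions used for illustration.

```python
def filter_trigger_images(images, old_score, new_score, threshold):
    """Keep generated images whose target-class score jumps sharply
    between the old and new global models -- consistent with the
    observation that triggers are learned faster than normal knowledge.

    old_score / new_score map an image to its target-class confidence
    under the old and new global models, respectively."""
    return [img for img in images
            if new_score(img) - old_score(img) > threshold]
```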
GitFL: Adaptive Asynchronous Federated Learning using Version Control
As a promising distributed machine learning paradigm that enables
collaborative training without compromising data privacy, Federated Learning
(FL) has been increasingly used in AIoT (Artificial Intelligence of Things)
design. However, due to the lack of efficient management of straggling devices,
existing FL methods greatly suffer from the problems of low inference accuracy
and long training time. Things become even worse when taking various uncertain
factors (e.g., network delays, performance variances caused by process
variation) existing in AIoT scenarios into account. To address these issues, this
paper proposes a novel asynchronous FL framework named GitFL, whose
implementation is inspired by the famous version control system Git. Unlike
traditional FL, the cloud server of GitFL maintains a master model (i.e., the
global model) together with a set of branch models indicating the trained local
models committed by selected devices, where the master model is updated based
on both all the pushed branch models and their version information, and only
the branch models after the pull operation are dispatched to devices. By using
our proposed Reinforcement Learning (RL)-based device selection mechanism, a
pulled branch model with an older version will be more likely to be dispatched
to a faster and less frequently selected device for the next round of local
training. In this way, GitFL enables both effective control of model staleness
and adaptive load balancing of versioned models among straggling devices, thus
avoiding performance deterioration. Comprehensive experimental results on
well-known models and datasets show that, compared with state-of-the-art
asynchronous FL methods, GitFL can achieve up to 2.64X training acceleration
and 7.88% inference accuracy improvements in various uncertain scenarios.
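The master-model update can be sketched as a version-weighted average of the pushed branch models; weighting proportionally to version number is an illustrative assumption standing in for GitFL's actual merge rule.

```python
def merge_master(branch_models, versions):
    """Merge branch models (flat parameter lists) into the master model,
    weighting each branch by its version so that staler branches
    (lower version numbers) contribute less to the merged result."""
    total = sum(versions)
    weights = [v / total for v in versions]
    dim = len(branch_models[0])
    return [sum(w * m[i] for w, m in zip(weights, branch_models))
            for i in range(dim)]
```

A fresher branch (version 3) thus dominates a staler one (version 1) in the merged master.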
Towards Better Fairness-Utility Trade-off: A Comprehensive Measurement-Based Reinforcement Learning Framework
Machine learning is widely used to make decisions with societal impact such
as bank loan approving, criminal sentencing, and resume filtering. How to
ensure its fairness while maintaining utility is a challenging but crucial
issue. Fairness is a complex and context-dependent concept with over 70
different measurement metrics. Since existing regulations are often vague in
terms of which metric to use and different organizations may prefer different
fairness metrics, it is important to have means of improving fairness
comprehensively. Existing mitigation techniques often target one specific
fairness metric and have limitations in improving multiple notions of fairness
simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility),
a reinforcement learning-based framework, to efficiently improve the
fairness-utility trade-off in machine learning classifiers. A comprehensive
measurement that can simultaneously consider multiple fairness notions as well
as utility is established, and new metrics are proposed based on an in-depth
analysis of the relationship between different fairness metrics. The reward
function of CFU is constructed with comprehensive measurement and new metrics.
We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning
models, and 15 fairness-utility measurements. The results demonstrate that CFU
can improve the classifier on multiple fairness metrics without sacrificing its
utility. It outperforms all state-of-the-art techniques, achieving a
37.5% improvement on average.
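The reward combining utility with multiple fairness notions can be sketched as a weighted mix; the equal-weight mean over fairness metrics and the `alpha` trade-off parameter are assumptions, not CFU's published reward function.

```python
def cfu_reward(utility, fairness_scores, alpha=0.5):
    """Scalar reward for the RL agent: a convex combination of model
    utility (e.g. accuracy) and the average of several fairness metrics,
    each assumed to lie in [0, 1] with higher meaning fairer."""
    fairness = sum(fairness_scores) / len(fairness_scores)
    return alpha * utility + (1.0 - alpha) * fairness
```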
LivelySpeaker: Towards Semantic-Aware Co-Speech Gesture Generation
Gestures are non-verbal but important behaviors accompanying people's speech.
While previous methods are able to generate speech rhythm-synchronized
gestures, the semantic context of the speech is generally lacking in the
gesticulations. Although semantic gestures do not occur very regularly in human
speech, they are indeed the key for the audience to understand the speech
context in a more immersive environment. Hence, we introduce LivelySpeaker, a
framework that realizes semantics-aware co-speech gesture generation and offers
several control handles. In particular, our method decouples the task into two
stages: script-based gesture generation and audio-guided rhythm refinement.
Specifically, the script-based gesture generation leverages the pre-trained
CLIP text embeddings as the guidance for generating gestures that are highly
semantically aligned with the script. Then, we devise a simple but effective
diffusion-based gesture generation backbone built on pure MLPs, which is
conditioned only on audio signals and learns to gesticulate with realistic
motions. We utilize this powerful prior to synchronize the script-guided
gestures with the audio signals, notably in a zero-shot setting. Our novel
two-stage
generation framework also enables several applications, such as changing the
gesticulation style, editing the co-speech gestures via textual prompting, and
controlling the semantic awareness and rhythm alignment with guided diffusion.
Extensive experiments demonstrate the advantages of the proposed framework over
competing methods. In addition, our core diffusion-based generative model also
achieves state-of-the-art performance on two benchmarks. The code and model
will be released to facilitate future research.
Comment: Accepted by ICCV 202
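The two-stage decoupling can be sketched as a simple composition of the stages; the stage interfaces here are hypothetical placeholders for the paper's CLIP-guided generator and MLP-based diffusion refiner.

```python
def generate_gestures(script, audio, semantic_stage, rhythm_stage):
    """Stage 1 produces semantics-aware gestures from the script alone;
    stage 2 refines them so their timing aligns with the audio."""
    gestures = semantic_stage(script)
    return rhythm_stage(gestures, audio)
```

Because the stages only meet at this interface, either one can be swapped out, which is what enables controls such as style changes or text-based editing.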
Form-NLU: Dataset for the Form Language Understanding
Compared to general document analysis tasks, form document structure
understanding and retrieval are challenging. Form documents are typically
produced by two types of authors: a form designer, who develops the form
structure and keys, and a form user, who fills in form values based on the
provided keys.
Hence, the form values may not be aligned with the form designer's intention
(structure and keys) if a form user gets confused. In this paper, we introduce
Form-NLU, the first dataset for form structure understanding and its key
and value information extraction, interpreting the form designer's intent and
the alignment of user-written values with it. It consists of 857 form images, 6k
form keys and values, and 4k table keys and values. Our dataset also includes
three form types: digital, printed, and handwritten, which cover diverse form
appearances and layouts. We propose a robust positional and logical
relation-based form key-value information extraction framework. Using this
dataset, Form-NLU, we first examine strong object detection models for form
layout understanding, then evaluate the key information extraction task on the
dataset, providing fine-grained results for different types of forms and keys.
Furthermore, we examine it with an off-the-shelf PDF layout extraction tool
and demonstrate its feasibility in real-world cases.
Comment: Accepted by SIGIR 202
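A purely positional key-value matcher can be sketched by assigning each detected value box to the nearest key box; this toy nearest-center rule illustrates only the positional side, and omits the logical relations the proposed framework also uses.

```python
def match_values_to_keys(keys, values):
    """Assign each value box to the closest key box by center distance.
    Boxes are (x0, y0, x1, y1) tuples; keys and values map names to boxes."""
    def center(box):
        return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

    pairs = {}
    for value_name, vbox in values.items():
        vx, vy = center(vbox)
        pairs[value_name] = min(
            keys,
            key=lambda k: (center(keys[k])[0] - vx) ** 2
                          + (center(keys[k])[1] - vy) ** 2,
        )
    return pairs
```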